Abstract


The primary purpose of this paper is to combine optimization and machine learning to
extract hidden rules, remove unrelated data, identify the most productive Decision-Making
Units (DMUs) in the optimization part, and identify the most accurate algorithm in the
machine learning part. In the optimization part, Data Envelopment Analysis (DEA), a
modeling method for computing the comparative productivities and efficiencies of DMUs,
is combined with the Malmquist Productivity Index (MPI) to compare productivities. We
apply the DEA evaluation with these well-known methods to thirteen pharmaceutical
companies in five developing countries over 2014-2019. To find the superior model, we use
CCR-DEA (the Charnes, Cooper and Rhodes model), BCC-DEA (the Banker, Charnes and
Cooper model), and Free Disposal Hull (FDH) to measure the performance and efficiency
of decision processes. We assess the models with financial information from Data-stream,
including Research and Development (R&D) investment. R&D expenditures relate to the
exploration and development of a company's properties or facilities. In the machine
learning part, we apply a two-layer data mining filtering pre-process before the clustering
algorithms to increase efficiency and to find the superior algorithm. The results indicate
that the FDH model yields the highest productivity scores (in MPI) and the highest
accuracy (in clustering) in all periods compared with the other suggested models. The
BCC-DEA and CCR-DEA models take second and third place, respectively. Meanwhile,
HIERARCHICAL CLUSTERER has the highest accuracy among the eight proposed algorithms.
1. Introduction


Developing novel drugs in the pharmaceutical industry requires astronomical resources and a hard-working group of specialists
over periods of more than a decade. Despite these long-term and high-cost investments, the probability of successfully
commercializing blockbuster products is limited to less than 10% [1]. DEA is a non-parametric frontier technique in optimization,
in which the efficiency of a specific entity is calculated by its distance from the best-practice frontier formed by the
best-performing entities in the group. DEA is a general method for assessing the efficiency of ecology-related systems [2].
Investment in R&D has been found to be a poorly contributing factor in TFP growth among the selected companies [3].
A recent study [4] evaluated the amount of DTE created by US-based pharmaceutical companies to find the effect of efficiency
on companies’ productivity and found that the higher the efficiency, the better the companies’ productivity. DEA window
analysis and MPI for evaluating the efficiency of the scorching procedure in pharmaceutical manufacturing have been proposed
by [5]. The outcomes of that study offer valuable feedback on how to improve efficiency, use resources, and manage
manufacturing lines effectively. Similar research applied a hybrid fuzzy MCDM technique to evaluate the performance of
public pharmaceutical companies [6,7]. An empirical study examined the efficiency of Indian pharmaceutical companies
during the collapse period using DEA techniques [8]. An estimation of the technical efficiencies, slacks, and input/output
targets of 50 large pharmaceutical companies over 2010-2011 attributed inefficiency in the companies to unproductive
decision-maker performance or low-scale operation [9]. Another paper on the same topic assessed the technical efficiency
and productivity of 81 companies related to pharmaceutical manufacturing [10].
The primary purpose of this paper is the evaluation of productivity with MPI and clustering algorithms for pharmaceutical
companies. The DEA methods above are applied with three models, CCR-IO (CCR Input Oriented), BCC-IO (BCC Input
Oriented), and FDH, to find the superior model.
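As a hedged illustration of the productivity calculation, the sketch below computes the MPI between two periods in its usual
geometric-mean form from four efficiency scores that a DEA model (CCR, BCC, or FDH) would supply. The function name, the
argument names, and the sample scores are hypothetical and are not taken from the study's data.

```python
from math import sqrt

def malmquist_index(e_t_t, e_t_t1, e_t1_t, e_t1_t1):
    """MPI between periods t and t+1 as a geometric mean of two ratios.

    e_a_b is the efficiency of the period-b observation measured against
    the period-a frontier (hypothetical argument names).
    """
    ratio_frontier_t = e_t_t1 / e_t_t      # change seen from the period-t frontier
    ratio_frontier_t1 = e_t1_t1 / e_t1_t   # change seen from the period-(t+1) frontier
    return sqrt(ratio_frontier_t * ratio_frontier_t1)

# Illustrative scores only: efficiency improves against both frontiers.
mpi = malmquist_index(e_t_t=0.80, e_t_t1=0.96, e_t1_t=0.75, e_t1_t1=0.90)
print(round(mpi, 4))  # a value above 1 indicates productivity growth
```

Under this convention, an index above one signals productivity growth and an index below one signals decline.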


2. Methods 


The objective of this study is to compare companies’ productivity effectively. A comparative DEA with MPI is used to
determine the features of pharmaceutical companies, treated as DMUs, with three suggested models. Data mining clustering
algorithms are considered at the next step. The entire procedure can be divided into four steps, as follows:


2.1. FDH, CCR, BCC Models 


The FDH model is a non-parametric method for measuring the efficiency of production units or DMUs. The FDH model relaxes
the convexity assumption of basic DEA models. Solving the FDH program involves a mixed integer programming problem,
whereas the basic DEA models are linear programming problems [11].
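Because the FDH reference set contains only observed DMUs, the mixed integer program can also be solved by direct
enumeration. The sketch below is a minimal illustration of that idea for the input-oriented case, not the authors'
implementation; the function name and the small data set are hypothetical, and all inputs are assumed to be strictly positive.

```python
def fdh_input_efficiency(inputs, outputs, o):
    """Input-oriented FDH efficiency of DMU `o` by direct enumeration.

    inputs[j] and outputs[j] are the input and output vectors of DMU j.
    """
    x_o, y_o = inputs[o], outputs[o]
    best = float("inf")
    for x_j, y_j in zip(inputs, outputs):
        # Only DMUs producing at least as much of every output can dominate DMU o.
        if all(yj >= yo for yj, yo in zip(y_j, y_o)):
            # Smallest uniform contraction of x_o that still covers x_j.
            ratio = max(xj / xo for xj, xo in zip(x_j, x_o))
            best = min(best, ratio)
    return best  # DMU o always qualifies with ratio 1, so the score is at most 1

# Hypothetical data: three DMUs, two inputs, one output.
inputs = [[2.0, 4.0], [4.0, 2.0], [4.0, 5.0]]
outputs = [[1.0], [1.0], [1.0]]
scores = [fdh_input_efficiency(inputs, outputs, o) for o in range(3)]
print(scores)
```

In this toy example the third DMU is dominated by the first, which uses less of both inputs for the same output, so its
score falls below one while the other two stay on the frontier.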

The BCC model assumes variable returns to scale (VRS), which covers increasing returns to scale (IRS), decreasing returns to
scale (DRS), and constant returns to scale (CRS). The production possibility set of the FDH model is defined differently
from those of the CCR and BCC models. The FDH model uses free disposability to construct the production possibility set.
Accordingly, the frontier of the FDH model is built from the observed inputs and outputs, allowing free disposal.
The efficiency of a given DMU is calculated based on the input-oriented CCR (CCR-IO) model as follows:
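As a hedged reference, the textbook input-oriented CCR model in envelopment form can be written as below. The symbols
(inputs x_{ij}, outputs y_{rj}, intensity weights \lambda_j, efficiency score \theta, and the index o for the DMU under
evaluation, with n DMUs, m inputs, and s outputs) are assumed notation and may differ from the authors' own.

```latex
\[
\begin{aligned}
\min_{\theta,\,\lambda}\quad & \theta \\
\text{s.t.}\quad & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{io}, && i = 1,\dots,m, \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro}, && r = 1,\dots,s, \\
& \lambda_j \ge 0, && j = 1,\dots,n.
\end{aligned}
\]
```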
The input-oriented BCC (BCC-IO) model is represented as follows:
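As a hedged reference, the textbook input-oriented BCC model in envelopment form adds a convexity constraint on the
intensity weights to the constant-returns formulation. The symbols (inputs x_{ij}, outputs y_{rj}, weights \lambda_j,
score \theta, and the index o for the DMU under evaluation) are assumed notation and may differ from the authors' own.

```latex
\[
\begin{aligned}
\min_{\theta,\,\lambda}\quad & \theta \\
\text{s.t.}\quad & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{io}, && i = 1,\dots,m, \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro}, && r = 1,\dots,s, \\
& \sum_{j=1}^{n} \lambda_j = 1, \\
& \lambda_j \ge 0, && j = 1,\dots,n.
\end{aligned}
\]
```

Restricting each \lambda_j further to the values 0 or 1 gives the mixed integer form associated with the FDH model.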
2.2. Evaluation in Clustering


Clustering is a main task of exploratory data mining and a common technique for statistical data analysis used in several
areas, including machine learning, pattern recognition, and bioinformatics. The numeric attributes used in the clustering
algorithms include energy consumption [12-16] [18-26], cement production, pollution control investment, and waste material
removed. The MPI score serves as the class attribute for the clustering algorithms. DMUs with an MPI greater than one are
labelled "yes," and DMUs with an MPI of less than one are labelled "no." The validity of the proposed method must be
evaluated in each study. To confirm the validity of the proposed model and to test the reliability of this research, the data
were divided into two groups, test data and training data, for the clustering algorithms. With this method, the final outputs
are reviewed, and the validity of the research is verified. In this study, 70 percent of the data were designated as the
training data set, and 30 percent were selected as the test data set. The suggested clustering method aims to identify
performance configurations within regular outlines of diverse constructing systems from raw data sets. The Excel software
was used to select the test data randomly. Finally, to compare the algorithms and find the superior one, eight clustering
algorithms in the WEKA software are considered: Canopy, Cobweb, Make Density Based Cluster, Expectation Maximization,
Farthest First, Filtered Cluster, Hierarchical Cluster, and K-means [17].
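The labelling and the 70/30 split described above can be sketched in a few lines. The study reports using Excel for the
random selection, so Python's random module is a swapped-in stand-in here; the function names, the fixed seed, the sample
scores, and the treatment of an MPI of exactly one (labelled "no") are assumptions.

```python
import random

def label_by_mpi(mpi_scores):
    """Class label per DMU: "yes" when MPI > 1, otherwise "no".

    An MPI of exactly 1 is not covered by the text; it is labelled "no" here (assumption).
    """
    return ["yes" if score > 1 else "no" for score in mpi_scores]

def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and test parts."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Illustrative MPI scores for ten hypothetical DMUs.
mpi_scores = [0.94, 1.15, 0.88, 1.34, 0.80, 1.63, 1.28, 0.95, 1.87, 0.78]
records = list(zip(mpi_scores, label_by_mpi(mpi_scores)))
train, test = split_train_test(records)
print(len(train), len(test))
```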


3. Discussion in the MPI Model 


3.1 Discussion in MPI-CCR Model


The average MPI-CCR for all pharmaceutical companies over 2015-2019 is given in Table 1. 


Table 1. Productivity measurement results based on MPI-CCR for 30 companies over 2015-2019 


Companies  MPI   Rank   Companies  MPI   Rank
1          0.94  21     16         0.48  30
2          1.15  16     17         1.94  1
3          0.88  23     18         0.90  22
4          1.34  12     19         1.70  7
5          0.80  25     20         1.92  2
6          1.63  8      21         1.04  18
7          1.28  13     22         0.65  28
8          0.95  20     23         1.09  17
9          1.87  4      24         1.16  15
10         0.78  26     25         1.42  11
11         1.24  14     26         0.84  24
12         1.56  9      27         1.81  5
13         1.48  10     28         1.77  6
14         1.03  19     29         1.91  3
15         0.68  27     30         0.57  29


3.2. Discussion in MPI-BCC Model 
The average MPI-BCC for all pharmaceutical companies over 2015-2019 is given in Table 2. 


Table 2. Productivity measurement results based on MPI-BCC for 30 companies over 2015-2019 


Companies  MPI   Rank   Companies  MPI   Rank
1          0.95  21     16         0.49  30
2          1.16  16     17         1.95  1
3          0.89  23     18         0.91  22
4          1.35  12     19         1.73  7
5          0.81  25     20         1.93  2
6          1.64  8      21         1.05  18
7          1.30  13     22         0.67  28
8          0.96  20     23         1.10  17
9          1.89  4      24         1.17  15
10         0.79  26     25         1.45  11
11         1.25  14     26         0.85  24
12         1.58  9      27         1.82  5
13         1.49  10     28         1.78  6
14         1.04  19     29         1.92  3
15         0.69  27     30         0.59  29


3.3 Discussion in MPI-FDH Model 


The average MPI-FDH for all pharmaceutical companies over 2015-2019 is given in Table 3. 













Jafari Golrokh and Hasan ENG Trans., vol. 1, pp. 1-8, November 2020 


Table 3. Productivity measurement results based on MPI-FDH for 30 companies over 2015-2019 


Companies  MPI   Rank   Companies  MPI   Rank
1          0.96  21     16         0.50  30
2          1.17  16     17         1.96  1
3          0.90  23     18         0.92  22
4          1.36  12     19         1.76  7
5          0.82  25     20         1.94  2
6          1.65  8      21         1.06  18
7          1.32  13     22         0.69  28
8          0.97  20     23         1.11  17
9          1.91  4      24         1.18  15
10         0.80  26     25         1.48  11
11         1.26  14     26         0.86  24
12         1.60  9      27         1.83  5
13         1.50  10     28         1.79  6
14         1.05  19     29         1.93  3
15         0.70  27     30         0.61  29

3.4 Results 


Although the differences between the productivity scores of the three suggested models are small, the FDH model
has the highest rank. The BCC and CCR models are in 2nd and 3rd place, respectively. The following relation
holds for all DMUs in all MPIs and all years:


FDH > BCC > CCR (13)
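
As a hedged cross-check of this relation on the reported averages, the sketch below takes the average MPI values of
Tables 1 to 3 in company order, recomputes the ranks, and tests the ordering company by company. It covers only the
published averages, not the individual years, and the helper name `ranks` is hypothetical.

```python
# Average MPI per company (1..30), transcribed from Tables 1 to 3.
mpi_ccr = [0.94, 1.15, 0.88, 1.34, 0.80, 1.63, 1.28, 0.95, 1.87, 0.78,
           1.24, 1.56, 1.48, 1.03, 0.68, 0.48, 1.94, 0.90, 1.70, 1.92,
           1.04, 0.65, 1.09, 1.16, 1.42, 0.84, 1.81, 1.77, 1.91, 0.57]
mpi_bcc = [0.95, 1.16, 0.89, 1.35, 0.81, 1.64, 1.30, 0.96, 1.89, 0.79,
           1.25, 1.58, 1.49, 1.04, 0.69, 0.49, 1.95, 0.91, 1.73, 1.93,
           1.05, 0.67, 1.10, 1.17, 1.45, 0.85, 1.82, 1.78, 1.92, 0.59]
mpi_fdh = [0.96, 1.17, 0.90, 1.36, 0.82, 1.65, 1.32, 0.97, 1.91, 0.80,
           1.26, 1.60, 1.50, 1.05, 0.70, 0.50, 1.96, 0.92, 1.76, 1.94,
           1.06, 0.69, 1.11, 1.18, 1.48, 0.86, 1.83, 1.79, 1.93, 0.61]

def ranks(scores):
    """Rank 1 goes to the highest score (ties are not handled)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

# FDH > BCC > CCR for every company's average MPI.
relation_holds = all(f > b > c for f, b, c in zip(mpi_fdh, mpi_bcc, mpi_ccr))
# The three models order the companies identically.
same_ranking = ranks(mpi_ccr) == ranks(mpi_bcc) == ranks(mpi_fdh)
print(relation_holds, same_ranking)
```
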

3.5 Discussion in the clustering


After applying the clustering steps (step A and step B), the accuracy and average accuracy at each stage are presented in
Tables 4, 5, and 6 for the CCR, BCC, and FDH models, respectively.
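
How the accuracy of a clustering algorithm is scored against the MPI class is not spelled out here. One simple possibility,
shown as a sketch under that assumption, credits each cluster with its most frequent class label and reports the share of
instances that match. The function name and the sample data are hypothetical, and WEKA's own evaluation may assign
classes to clusters differently.

```python
from collections import Counter

def clusters_to_classes_accuracy(cluster_ids, class_labels):
    """Percent of instances whose class equals the majority class of their cluster."""
    members = {}
    for cluster, label in zip(cluster_ids, class_labels):
        members.setdefault(cluster, []).append(label)
    # Each cluster is credited with the count of its most frequent class label.
    correct = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return 100.0 * correct / len(class_labels)

# Hypothetical test set of ten DMUs: cluster assignments and MPI-based classes.
clusters = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
classes = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]
accuracy = clusters_to_classes_accuracy(clusters, classes)
print(f"{accuracy:.4f}")
```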


Table 4. Accuracy comparison of clustering algorithms for the CCR model (all numbers are in percent)


Algorithms                          Step A       Step B
CANOPY                              54.6364      [illegible]
COBWEB                              4.8182       18.1818
EXPECTATION MAXIMIZATION            [missing]    74.7273
FARTHEST FIRST                      [missing]    [illegible]
FILTERED CLUSTERER                  [missing]    55.5455
HIERARCHICAL CLUSTERER              [missing]    82.8182
MAKE DENSITY BASED CLUSTER          [missing]    53.9495
K-MEANS                             [missing]    55.5455
Avg. of eight algorithms accuracy   [missing]    58.9545


Table 5. Accuracy comparison of clustering algorithms for the BCC model (all numbers are in percent)


Algorithms                          Step A       Step B
CANOPY                              56.6364      57.5455
COBWEB                              6.8182       20.1818
EXPECTATION MAXIMIZATION            [missing]    76.7273
FARTHEST FIRST                      [missing]    75.7273
FILTERED CLUSTERER                  [missing]    58.5455
HIERARCHICAL CLUSTERER              [missing]    86.8182
MAKE DENSITY BASED CLUSTER          [missing]    59.5455
K-MEANS                             52.9091      56.5455
Avg. of eight algorithms accuracy   55.5011      61.4545
Table 6. Accuracy comparison of clustering algorithms for the FDH model (all numbers are in percent)


Algorithms                          Step A       Step B
CANOPY                              60.6364      61.5455
COBWEB                              8.8182       21.1818
EXPECTATION MAXIMIZATION            67.0999      77.7273
FARTHEST FIRST                      74           76.7273
FILTERED CLUSTERER                  56.9091      59.5455
HIERARCHICAL CLUSTERER              76.8182      87.8182
MAKE DENSITY BASED CLUSTER          58.8182      64.5455
K-MEANS                             53.9091      57.5455
Avg. of eight algorithms accuracy   57.1261      63.3295


From Tables 4, 5, and 6, it can be concluded that as the number of filtering layers increases:


• The maximum accuracy across the two assessment steps improves.
• The average accuracy of the eight algorithms at each filtering step increases.
• The accuracy of every algorithm increases as well.
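
These three observations can be checked directly against Table 6. The sketch below is a small illustration using the
Step A and Step B values of seven algorithms together with the reported averages; Expectation Maximization is omitted
from this sketch, and the variable names are hypothetical.

```python
# (Step A, Step B) accuracy in percent, transcribed from Table 6.
# Expectation Maximization is left out of this sketch.
table6 = {
    "CANOPY": (60.6364, 61.5455),
    "COBWEB": (8.8182, 21.1818),
    "FARTHEST FIRST": (74.0, 76.7273),
    "FILTERED CLUSTERER": (56.9091, 59.5455),
    "HIERARCHICAL CLUSTERER": (76.8182, 87.8182),
    "MAKE DENSITY BASED CLUSTER": (58.8182, 64.5455),
    "K-MEANS": (53.9091, 57.5455),
}
reported_average = (57.1261, 63.3295)  # average of eight algorithms, as reported

max_step_a = max(a for a, _ in table6.values())
max_step_b = max(b for _, b in table6.values())
best_step_a = max(table6, key=lambda name: table6[name][0])
best_step_b = max(table6, key=lambda name: table6[name][1])

maximum_improves = max_step_b > max_step_a
average_improves = reported_average[1] > reported_average[0]
all_improve = all(b > a for a, b in table6.values())
print(maximum_improves, average_improves, all_improve, best_step_a, best_step_b)
```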

HIERARCHICAL CLUSTERER has the highest accuracy at both step A and step B. In fact, for our data, attributes, and
instances, the HIERARCHICAL CLUSTERER algorithm has the highest accuracy in the proposed methodology combining DEA and
data mining. Finally, the FDH model has the highest rank. The BCC and CCR models are in 2nd and 3rd place, respectively.
The following relation holds for all DMUs in all MPIs and all years:


FDH > BCC > CCR (14)


4. Conclusion 


In this study, we describe how companies operate in the presence of similar companies. Therefore, companies that have a
higher score can improve their productivity. The more available information is used, the more accurate and accessible the
data will be. Each company needs a productivity measurement to know its current status. Productive companies are thus the
best reference for increasing the efficiency of inefficient companies. The FDH model has a more positive impact on the
efficiency score compared with the other suggested models, CCR and BCC. The proposed approach, geometric average, results,
and predictions derived from the period and productivity score can help practitioners compare the productivity of uncertain
cases and act accordingly. In the future, applying window analysis and comparing the final productivity results with MPI will
be valuable. Using fuzzy and random data for window analysis will also be interesting as a final comparison. Since the
proposed MPI method is based on a moving average, it is useful for finding efficiency trends over time. The results and
predictions can therefore be helpful for managers of these companies and for other managers who use this approach to achieve
a higher relative efficiency score. Besides, managers can compare the efficiency of the current year with that of other similar
companies over past years. Finally, as the novelty of this paper, data mining clustering algorithms with unique pre-processing
filtering methods identify the best-performing algorithm. As a future approach, we will improve the method by combining it
with other available optimization and machine learning techniques such as, but not limited to, [27-45].